Improved spelling recognition using a tree-based fast lexical match
Authors
Abstract
This paper addresses the problem of selecting a name from a very large list using spelling recognition. In order to greatly reduce the computational resources required, we propose a tree-based lexical fast match scheme to select a short list of candidate names. Our system consists of a free letter recognizer, a fast matcher, and a rescoring stage. The letter recognizer uses n-grams to generate an n-best list of letter hypotheses. The fast matcher is a tree that is based on confusion classes, where a confusion class is a group of acoustically similar letters such as the e-set. The fast matcher reduces over 100,000 unique last names to tens or hundreds of candidates. Then the rescoring stage picks the best name using either letter alignment or a constrained grammar. The fast matcher retained the correct name 99.6% of the time and the system retrieved the correct name 97.6% of the time.
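The fast-match idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Only the e-set grouping comes from the abstract; every other confusion class, the names `ClassTree`, `class_key`, and `fast_match`, and the exact-length lookup are assumptions made for illustration. The sketch ignores letter insertions and deletions, which a real system would need to handle.

```python
# Illustrative confusion classes. Only the e-set is named in the abstract;
# the remaining groupings are assumptions for this sketch.
CONFUSION_CLASSES = [
    "BCDEGPTVZ",  # e-set
    "AJK",        # assumed
    "MN",         # assumed
    "FSX",        # assumed
    "IY",         # assumed
    "QU",         # assumed
]

# Map each letter to the index of its confusion class.
CLASS_OF = {
    letter: idx
    for idx, group in enumerate(CONFUSION_CLASSES)
    for letter in group
}


def class_key(word):
    """Convert a spelled word to its sequence of confusion-class symbols.

    Letters outside every class act as singleton classes and are
    represented by the letter itself.
    """
    return tuple(CLASS_OF.get(ch, ch) for ch in word.upper())


class ClassTree:
    """Tree whose edges are confusion classes and whose nodes hold names."""

    def __init__(self):
        self.root = {"children": {}, "names": []}

    def insert(self, name):
        node = self.root
        for sym in class_key(name):
            node = node["children"].setdefault(
                sym, {"children": {}, "names": []}
            )
        node["names"].append(name.upper())

    def lookup(self, hypothesis):
        """Return all names whose class sequence equals the hypothesis's."""
        node = self.root
        for sym in class_key(hypothesis):
            node = node["children"].get(sym)
            if node is None:
                return []
        return list(node["names"])


def fast_match(tree, n_best):
    """Union the candidates for each hypothesis, keeping first-seen order."""
    candidates = []
    for hyp in n_best:
        for name in tree.lookup(hyp):
            if name not in candidates:
                candidates.append(name)
    return candidates


tree = ClassTree()
for name in ["DEAN", "BEAM", "SMITH", "JONES"]:
    tree.insert(name)

# "PEAN" is a misrecognition that stays within the confusion classes.
short_list = fast_match(tree, ["PEAN", "JONES"])
```

Here `short_list` holds the names that share a class sequence with any hypothesis. A rescoring stage, as described in the abstract, would then choose among them.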
Similar resources
Lexical orthography acquisition: Is handwriting better than spelling aloud?
Lexical orthography acquisition is currently described as the building of links between the visual forms and the auditory forms of whole words. However, a growing body of data suggests that a motor component could further be involved in orthographic acquisition. A few studies support the idea that reading plus handwriting is a better lexical orthographic learning situation than reading alone. H...
Design and implementation of Persian spelling detection and correction system based on Semantic
The Persian language has special features (graphemes, homophones, and multi-shape clinging characters) that matter on electronic devices. Furthermore, designing and implementing NLP tools for Persian is more challenging than for other languages (e.g. English or German). Spelling tools are widely used for editing user texts such as emails and text in editors. Also developing Persian tools will provide Persian progr...
Speed improvement of the time-asynchronous acoustic fast match
This paper describes an algorithm for improvement of the speed of a time-asynchronous fast match, which is a part of a stack-search based recognition system. This fast match uses a phonetic tree to represent the entire vocabulary of the recognizer. Evaluation of the tree (in a depth-first manner) can be done much more efficiently using the fact that under certain conditions, the results of branch e...
SimSem: Fast Approximate String Matching in Relation to Semantic Category Disambiguation
In this study we investigate the merits of fast approximate string matching to address challenges relating to spelling variants and to utilise large-scale lexical resources for semantic class disambiguation. We integrate string matching results into machine learning-based disambiguation through the use of a novel set of features that represent the distance of a given textual span to the closest...
Spelling consistency affects reading in young Dutch readers with and without dyslexia.
Lexical-decision studies with experienced English and French readers have shown that visual-word identification is not only affected by pronunciation inconsistency of a word (i.e., multiple ways to pronounce a spelling body), but also by spelling inconsistency (i.e., multiple ways to spell a pronunciation rime). The aim of this study was to compare the reading behavior of young Dutch readers wi...
Journal title:
Volume, issue:
Pages: -
Publication date: 1999